Lookback
Important: When a job that includes a lookback period runs, a temporary job appears in the Jobs queue. The temporary job does not resolve until all other lookback jobs are complete.
Lookback options
There are two lookback options to consider.
Option | Description |
---|---|
Union Lookback | Loads a specified number of historical runs of files on certain dates under the same dataset name and includes them in the DQ scans. |
Full File Lookback | Includes the historical context of a single file in the outlier and/or patterns scans. To string together historical files that contain a single timeslice and include a larger historical window, use union lookback instead. |
Union Lookback (-fllb)
Union Lookback, also known as File Lookback (-fllb), is used with deep learning and pattern matching. The example below uses it with deep learning.
File Lookback looks up the DQ check history of previous files.
-fllb
The -fllb flag is often used with files, in conjunction with -adddc, when a date column is not in an ideal format or the dataset has no date column.
Despite its name, it can be used with both file and database storage formats.
Note: File Lookback (-fllb) should only be used when a SQL layer is not available. It is intended for advanced use cases and may not be suitable for all file types and folder structures. As a best practice, expose a date signature somewhere in the file or directory naming convention.
Example
-ds "demo_lookback" \
-rd "2017-07-29" \
-lib "/opt/owl/drivers/mysql" \
-cxn "mysql" \
-q "select * from lake.dateseries where DATE_COL = '2017-07-29' " \
-dc DATE_COL \
-dl \
-dlkey sym \
-dllb 4 \
-fllb
Note: This lookback loads your past 4 runs as the historical training set.
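To make the note above concrete, the following Python sketch shows one way to think about which prior runs a lookback of 4 would select. It is not how DQ implements lookback: the `select_lookback_runs` helper, the sample run history, and the assumption that "past runs" means the most recent runs strictly before the run date are all hypothetical and used for illustration only.

```python
from datetime import date


def select_lookback_runs(run_dates, run_date, lookback):
    """Return up to `lookback` most recent run dates strictly before `run_date`.

    Hypothetical helper for illustration only; not part of the DQ product.
    """
    earlier = sorted(d for d in run_dates if d < run_date)
    # Guard against lookback == 0, because earlier[-0:] would return everything.
    return earlier[-lookback:] if lookback > 0 else []


# Assumed history of daily runs stored under the same dataset name.
history = [date(2017, 7, day) for day in (23, 24, 25, 26, 27, 28, 29)]

# Mirrors -rd "2017-07-29" with -dllb 4 from the example above.
training_set = select_lookback_runs(history, date(2017, 7, 29), lookback=4)
print([d.isoformat() for d in training_set])
```

Under these assumptions, the run dated 2017-07-29 would be trained on the four runs immediately preceding it. If fewer than four earlier runs exist, the sketch simply returns the ones that are available.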
Full File Lookback (-fullfile)
Like Union Lookback, Full File Lookback (-fullfile) is used with deep learning and pattern matching.
Full File Lookback uses the entire file for lookbacks instead of just the file query.
Understanding lookback command line flags
When you look at the DQ Job command line with lookback enabled, you may see several different lookback flags. The following table describes what each lookback flag means and to which DQ layers they apply.
Flag | Description | Layer |
---|---|---|
-fllb | Union lookback loads the DQ check history of files or database tables on certain dates under the same dataset name and includes them in the DQ Job. | Outliers and Patterns |
-fllbminrow | The minimum number of rows in a dataset before it is included in a file lookback. This is automatically applied when Union Lookback is selected. | Outliers and Patterns |
-dllb | Deep learning lookback loads DQ checks over a specified period of days and includes them in the scan for outliers. | Outliers |
-dlminhist | An automatically generated flag based on the outlier lookback setting -dllb that defines the minimum number of days before DQ flags data as potential outliers. -dlminhist ensures that the number of days in the algorithm is relative to the total scope of the lookback period. | Outliers |
-fullfile | Includes the historical context of a single file in the outlier and/or patterns scans. | Outliers and Patterns |
-bhlb | Behavior lookback loads a specified period of past DQ checks to include in a DQ Job. This controls the baseline profiling of a dataset. | Behavior |
-fpglb | Pattern lookback loads DQ checks over a specified period of days and includes them in the scan for patterns. | Patterns |
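As a reading aid, the table above can be expressed as a small lookup. The Python sketch below scans a command line for the lookback flags listed in the table and reports the layers each one applies to. The `LOOKBACK_FLAGS` mapping and the `describe_lookback_flags` function are hypothetical names introduced here for illustration; only the flag-to-layer pairs come from the table.

```python
import shlex

# Flag-to-layer pairs copied from the table above.
LOOKBACK_FLAGS = {
    "-fllb": "Outliers and Patterns",
    "-fllbminrow": "Outliers and Patterns",
    "-dllb": "Outliers",
    "-dlminhist": "Outliers",
    "-fullfile": "Outliers and Patterns",
    "-bhlb": "Behavior",
    "-fpglb": "Patterns",
}


def describe_lookback_flags(command_line):
    """Return (flag, layer) pairs for each lookback flag found, in order.

    Hypothetical helper for illustration only; not part of the DQ product.
    """
    # shlex.split keeps quoted values (such as a SQL query) as single tokens.
    tokens = shlex.split(command_line)
    return [(tok, LOOKBACK_FLAGS[tok]) for tok in tokens if tok in LOOKBACK_FLAGS]


# Flags taken from the Union Lookback example earlier on this page.
cmd = '-ds "demo_lookback" -rd "2017-07-29" -dc DATE_COL -dl -dlkey sym -dllb 4 -fllb'
for flag, layer in describe_lookback_flags(cmd):
    print(f"{flag}: {layer}")
```

For the example command, the sketch reports `-dllb` (Outliers) and `-fllb` (Outliers and Patterns), matching the deep learning usage described for Union Lookback.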